Open-source DeepResearch – Freeing our search agents - nikkie-memos

https://huggingface.co/blog/open-deep-research

A 24-hour challenge in response to Introducing deep research (OpenAI)

Benchmark: General AI Assistants benchmark (GAIA)

https://github.com/huggingface/smolagents/tree/c41a50a0dbdebbdbb3e5a939c790184021b2f870/examples/open_deep_research (ref: Results 🏅)

Reportedly implemented with a text browser (ref: Making the right tools 🛠️)

Related? https://github.com/huggingface/smolagents/pull/317

What are Agent frameworks and why they matter?

What's next for AI agentic workflows ft. Andrew Ng of AI Fund

Introducing smolagents, a simple library to build agents

(TODO: skipped the part about metrics)

Building an open Deep Research

Using a CodeAgent

Executable Code Actions Elicit Better LLM Agents

An agent that expresses its actions as code
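To make the difference concrete, here is a minimal sketch of "JSON actions" versus "code actions". The tools `search` and `count_words`, the toy corpus, and the two runner functions are all hypothetical names made up for this memo; none of this is the smolagents API. A real framework would also run the snippet in a restricted interpreter or sandbox, which plain `exec` is not.

```python
import json

# Hypothetical tools for illustration only (not smolagents APIs).
def search(query: str) -> list[str]:
    corpus = {
        "gaia": ["GAIA is a benchmark", "GAIA has a validation set"],
        "smolagents": ["smolagents is a library for agents"],
    }
    return corpus.get(query.lower(), [])

def count_words(text: str) -> int:
    return len(text.split())

TOOLS = {"search": search, "count_words": count_words}

def run_json_action(action_json: str):
    """Execute one JSON action: exactly one tool call per step."""
    action = json.loads(action_json)
    return TOOLS[action["tool"]](**action["arguments"])

def run_code_action(code: str):
    """Execute one code action: a snippet that may chain several tool calls.

    NOTE: exec() is NOT a sandbox; this only illustrates the idea.
    """
    namespace = {"__builtins__": {"sum": sum}, **TOOLS}
    exec(code, namespace)
    return namespace.get("result")

# JSON style: one call per step, the loop lives outside the action.
hits = run_json_action(json.dumps({"tool": "search", "arguments": {"query": "gaia"}}))
json_total = 0
json_steps = 1
for hit in hits:
    step = json.dumps({"tool": "count_words", "arguments": {"text": hit}})
    json_total += run_json_action(step)
    json_steps += 1

# Code style: the same work expressed as a single action.
CODE_ACTION = """
hits = search("gaia")
result = sum(count_words(h) for h in hits)
"""
code_total = run_code_action(CODE_ACTION)

print(json_steps, json_total, code_total)
```

In this toy setup the JSON agent needs three separate steps (one search plus one count per hit), while the code agent gets the same total in one step because loops and intermediate variables come for free.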

aymeric-roucher/agent_reasoning_benchmark

Making the right tools 🛠️

1. A web browser

we started with an extremely simple text-based web browser for now for our first proof-of-concept

SimpleTextBrowser: smolagents/examples/open_deep_research/scripts/text_web_browser.py
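As a note to self on what a "text-based web browser" boils down to, below is a rough sketch: strip the HTML down to visible text and let the agent page through it one viewport at a time. `TinyTextBrowser` and its methods are hypothetical names for this memo and are not the actual `SimpleTextBrowser` implementation; it takes an HTML string directly, so fetching pages and searching are left out.

```python
from html.parser import HTMLParser

class _TextExtractor(HTMLParser):
    """Collect visible text, skipping <script> and <style> contents."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0 and data.strip():
            # Collapse runs of whitespace inside each text chunk.
            self.chunks.append(" ".join(data.split()))

class TinyTextBrowser:
    """Hypothetical paged text view over one HTML document."""

    def __init__(self, viewport_size=80):
        self.viewport_size = viewport_size  # characters per page
        self.text = ""
        self.page = 0

    def load_html(self, html):
        extractor = _TextExtractor()
        extractor.feed(html)
        extractor.close()
        self.text = "\n".join(extractor.chunks)
        self.page = 0
        return self.viewport()

    @property
    def page_count(self):
        # Ceiling division, with at least one (possibly empty) page.
        return max(1, -(-len(self.text) // self.viewport_size))

    def viewport(self):
        start = self.page * self.viewport_size
        return self.text[start:start + self.viewport_size]

    def page_down(self):
        self.page = min(self.page + 1, self.page_count - 1)
        return self.viewport()

    def page_up(self):
        self.page = max(self.page - 1, 0)
        return self.viewport()
```

Exposed as tools, `page_down` and `page_up` would let an agent read a long page in pieces that fit its context window.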

2. A simple text inspector

https://github.com/huggingface/smolagents/blob/gaia-submission-r1/examples/open_deep_research/scripts/text_inspector_tool.py

Looks like a tool that reads files with a wide variety of extensions
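My guess at the general shape of such a tool, as a sketch only: pick a reader based on the file extension and return everything as plain text. `inspect_file` and the `READERS` table are hypothetical and cover just a few stdlib-friendly formats; this is not the code in `text_inspector_tool.py`.

```python
import csv
import json
from pathlib import Path

def _read_text(path: Path) -> str:
    return path.read_text(encoding="utf-8")

def _read_json(path: Path) -> str:
    # Re-serialize so the agent sees consistently indented JSON.
    data = json.loads(path.read_text(encoding="utf-8"))
    return json.dumps(data, indent=2, ensure_ascii=False)

def _read_csv(path: Path) -> str:
    with path.open(newline="", encoding="utf-8") as f:
        return "\n".join(" | ".join(row) for row in csv.reader(f))

# Hypothetical dispatch table: extension -> reader returning plain text.
READERS = {
    ".txt": _read_text,
    ".md": _read_text,
    ".json": _read_json,
    ".csv": _read_csv,
}

def inspect_file(path) -> str:
    """Return the file's content as text, chosen by its extension."""
    path = Path(path)
    reader = READERS.get(path.suffix.lower())
    if reader is None:
        raise ValueError(f"Unsupported file extension: {path.suffix!r}")
    return reader(path)
```

Supporting another format would then mean adding one reader function and one entry in the table.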

These tools were taken from the excellent Magentic-One agent by Microsoft Research,

Magentic-One: A Generalist Multi-Agent System for Solving Complex Tasks

Results 🏅

We’ve quickly gone up from the previous SoTA with an open framework, around 46% for Magentic-One, to our current performance of 55.15% on the validation set.

They attribute this to the CodeAgent

when switching to a standard agent that writes actions in JSON instead of code, performance of the same setup is instantly degraded to 33% average on the validation set.